Accurate sales forecasting is essential for retail planning, inventory optimization, and supply chain management. Traditional statistical models often struggle with nonlinear patterns and complex seasonality, while standalone machine learning approaches may ignore the underlying temporal structure of the data. This study proposes a hybrid weekly retail sales forecasting framework that integrates Facebook Prophet and XGBoost. Prophet is first used to model trend, seasonality, and holiday effects, producing baseline forecasts and, from them, residual errors. An XGBoost regression model then learns the nonlinear structure in these residuals from engineered temporal features, including lag variables, rolling statistics, growth rates, and holiday indicators. The final prediction is the Prophet forecast plus the XGBoost-predicted residual. Model performance is evaluated using MAE, RMSE, and MAPE. Experimental results show that the hybrid Prophet–XGBoost model significantly improves forecasting accuracy over the standalone Prophet model, providing a scalable, interpretable, and data-driven solution for retail sales prediction.
Introduction
Accurate sales forecasting is crucial in retail because it directly affects inventory management, supply chain efficiency, workforce planning, and profitability. Retail sales data are influenced by seasonality, holidays, promotions, and market changes, making forecasting challenging. Traditional statistical models often fail to capture nonlinear demand fluctuations, while standalone machine learning models may overlook important temporal patterns such as trends and seasonality.
Purpose of the Study
This research proposes a hybrid forecasting framework that combines:
Prophet for modeling trends, seasonality, and holiday effects.
XGBoost for learning and correcting nonlinear residual errors.
The goal is to improve the accuracy, robustness, and interpretability of weekly retail sales forecasts.
Motivation
Retail organizations need reliable forecasts to avoid:
Overstocking
Stock shortages
Increased operational costs
Revenue loss
While Prophet effectively captures trend, seasonality, and holiday effects, it cannot fully model complex nonlinear variations. XGBoost can capture such variations but does not explicitly represent trend or seasonality, which makes its predictions harder to interpret over time. Combining the two models addresses both limitations.
Objectives
The study aims to:
Preprocess and aggregate weekly retail sales data.
Model temporal patterns using Prophet.
Predict residual errors using XGBoost with engineered features.
Generate hybrid forecasts by combining both models.
Evaluate performance using:
Mean Absolute Error (MAE)
Root Mean Square Error (RMSE)
Mean Absolute Percentage Error (MAPE)
Produce future forecasts with confidence intervals for decision-making.
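The three evaluation metrics listed above have standard definitions. The following minimal sketch in plain Python shows how they are computed; the function names are illustrative, and the MAPE function assumes no actual value is zero (the document does not say how zero-sales weeks would be handled).

```python
import math


def _check(actual, predicted):
    # Both series must be non-empty and aligned week by week.
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")


def mae(actual, predicted):
    """Mean Absolute Error: mean of |y_t - yhat_t|."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean of (y_t - yhat_t)^2."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; undefined if any actual value is zero."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only (not results from the study).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

Because RMSE squares errors before averaging, it weights a few large misses (for example, on holiday weeks) more heavily than MAE does, while MAPE is scale-free and therefore easier to compare across series of different magnitude.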
Problem Statement
Existing forecasting methods face several challenges:
Statistical models struggle with nonlinear fluctuations.
Machine learning models may ignore seasonality and long-term trends.
Most approaches rely on a single forecasting technique.
The study addresses the need for a framework that:
Models trend and seasonality accurately.
Captures nonlinear demand behavior.
Reduces prediction errors.
Provides interpretable and reliable forecasts.
Literature Review Findings
Previous studies have used:
Machine learning models such as Decision Trees, Random Forests, and Neural Networks.
Hybrid models such as ARIMA-BIGRU [2].
Statistical models including SARIMAX [3], ARIMA, and Prophet [4].
Ensemble regression methods.
However, most studies focus either solely on statistical approaches or solely on machine learning models. Few integrate time-series decomposition with gradient-boosting-based residual correction; this study addresses that research gap.
Methodology
1. Data Preparation
The study uses the Walmart weekly sales dataset:
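6,435 store-level weekly records from 45 stores (45 stores × 143 weeks).
Data period: February 2010 – October 2012.
Aggregating the records by week across all stores yields 143 weekly observations.
Training data: 122 weeks (approximately 85%).
Testing data: 21 weeks (approximately 15%).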
Preprocessing steps included:
Date conversion and sorting.
Weekly aggregation.
Missing value handling.
Outlier removal.
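The aggregation and split described above can be sketched with the standard library alone. This sketch assumes each raw record is a (store, week date, weekly sales) tuple, that weekly totals are summed across stores, and that the split is chronological; the document states none of these explicitly, and the function names are hypothetical. Missing-value handling and outlier removal are omitted because their rules are not specified.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate_weekly(records):
    """Sum store-level sales into one total per week (assumed aggregation rule).

    records: iterable of (store_id, week_date, weekly_sales) tuples.
    Returns a list of (week_date, total_sales) pairs sorted by date.
    """
    totals = defaultdict(float)
    for _store, week, sales in records:
        totals[week] += sales
    return sorted(totals.items())


def chronological_split(series, train_fraction=0.85):
    """Split a time-ordered series without shuffling, so every test week follows every training week."""
    cut = round(train_fraction * len(series))
    return series[:cut], series[cut:]


# Synthetic data with the dataset's shape: 45 stores x 143 weeks = 6,435 records.
start = date(2010, 2, 5)
records = [
    (store, start + timedelta(weeks=w), 1000.0 + store)
    for store in range(1, 46)
    for w in range(143)
]
weekly = aggregate_weekly(records)
train, test = chronological_split(weekly)
print(len(records), len(weekly), len(train), len(test))  # 6435 143 122 21
```

With 143 weeks, round(0.85 × 143) = 122, which reproduces the 122/21 split reported above.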
2. Prophet Forecasting
Prophet models:
Long-term trends.
Seasonal patterns.
Holiday effects.
It generates baseline forecasts and confidence intervals.
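For reference, the block below restates Prophet's published additive model (Taylor and Letham, "Forecasting at Scale") and links it to the residual used in the next step. The document does not report which trend type, number of Fourier terms, or holiday list was used, so the piecewise-linear trend and yearly seasonality shown here are assumptions. With weekly-aggregated data, day-of-week seasonality is not identifiable, so the yearly Fourier term is the relevant seasonal component.

```latex
% Prophet's additive decomposition
y(t) = g(t) + s(t) + h(t) + \varepsilon_t

% Piecewise-linear trend: rate k, offset m, changepoint adjustments \delta;
% \gamma is set so that g(t) stays continuous at each changepoint
g(t) = \bigl(k + \mathbf{a}(t)^{\top}\boldsymbol{\delta}\bigr)\, t + \bigl(m + \mathbf{a}(t)^{\top}\boldsymbol{\gamma}\bigr)

% Yearly seasonality as a truncated Fourier series (P \approx 365.25 days)
s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)

% Holiday effects: Z(t) holds indicator columns for holiday windows
h(t) = Z(t)\,\boldsymbol{\kappa}

% Baseline forecast and residual passed to the next stage
\hat{y}^{P}_t = \hat{g}(t) + \hat{s}(t) + \hat{h}(t), \qquad r_t = y_t - \hat{y}^{P}_t
```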
3. Residual Error Calculation
Residuals are computed as:
Residual = Actual Sales − Prophet Forecast
These residuals capture the variation that Prophet leaves unexplained, including nonlinear effects that may arise from factors such as promotions and market changes.
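Given aligned lists of actual weekly sales and Prophet point forecasts (Prophet's `predict` output exposes these as the `yhat` column), the residual step is an element-wise subtraction. The sign convention matters: because the hybrid forecast adds the predicted residual back to the Prophet forecast, residuals must be actual minus forecast, which the sketch checks. The numbers are made up for illustration, and `compute_residuals` is a hypothetical helper name.

```python
def compute_residuals(actual, prophet_forecast):
    """Residual_t = actual_t - prophet_forecast_t (the document's sign convention)."""
    if len(actual) != len(prophet_forecast):
        raise ValueError("series must be aligned week by week")
    return [a - f for a, f in zip(actual, prophet_forecast)]


# Illustrative values only (not from the study).
actual = [46_000.0, 52_500.0, 48_200.0]
prophet_fit = [47_000.0, 50_000.0, 48_700.0]
residuals = compute_residuals(actual, prophet_fit)
print(residuals)  # [-1000.0, 2500.0, -500.0]

# Adding the residuals back recovers the actual values,
# which is why the hybrid step uses Prophet forecast + residual.
assert [f + r for f, r in zip(prophet_fit, residuals)] == actual
```

In a leakage-free setup, the residuals used to train XGBoost would come from Prophet's fit over the training weeks only; this is an assumption about the pipeline rather than something the document states.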
4. XGBoost Residual Learning
XGBoost predicts residual errors using engineered features such as:
Lag sales values.
Rolling averages.
Rolling standard deviation.
Sales growth rate.
Holiday indicators.
Week and month information.
This enables the model to learn short-term dependencies and nonlinear demand patterns.
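The feature list above can be built from the weekly series with the standard library. This is a sketch under stated assumptions: the lag set (1, 2, 4 weeks), the 4-week rolling window, the growth-rate definition, and the use of ISO week numbers are illustrative choices, not settings reported by the study. Sales-derived features for week t use only earlier weeks, so a row never contains the value it is meant to explain.

```python
import statistics
from datetime import date, timedelta


def build_features(dates, sales, holiday_flags, lags=(1, 2, 4), window=4):
    """Build one feature dict per week that has enough history.

    Sales-derived features for week t use only weeks before t (no look-ahead);
    the holiday flag and calendar fields for week t are known in advance.
    `lags` and `window` are illustrative defaults, not the study's settings.
    """
    if window < 2:
        raise ValueError("window must be at least 2 for a standard deviation")
    first = max(max(lags), window, 2)  # first week with full lag, window, and growth history
    rows = []
    for t in range(first, len(sales)):
        past = sales[t - window:t]  # the `window` weeks strictly before t
        row = {f"lag_{k}": sales[t - k] for k in lags}
        row["roll_mean"] = statistics.mean(past)
        row["roll_std"] = statistics.stdev(past)  # sample standard deviation
        # Growth of the latest observed week relative to the week before it.
        row["growth"] = (sales[t - 1] - sales[t - 2]) / sales[t - 2]
        row["holiday"] = int(holiday_flags[t])
        row["week"] = dates[t].isocalendar()[1]  # ISO week number
        row["month"] = dates[t].month
        rows.append(row)
    return rows


# Illustrative 8-week series (not study data).
start_date = date(2010, 2, 5)
dates = [start_date + timedelta(weeks=i) for i in range(8)]
sales = [100.0, 110.0, 121.0, 100.0, 120.0, 130.0, 125.0, 140.0]
holidays = [False, False, False, False, False, True, False, False]
features = build_features(dates, sales, holidays)
print(len(features), features[0]["lag_1"], features[0]["roll_mean"])  # 4 100.0 107.75
```

Converted to a numeric matrix with a fixed column order, these rows could serve as the inputs to `xgboost.XGBRegressor.fit`, with the Prophet residuals of the same weeks as targets. Over the 21-week test period, lags of actual sales are available only for one-step-ahead evaluation; a multi-step forecast would need recursively generated lags, and the document does not say which setup was used.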
5. Hybrid Forecast Generation
The final prediction is:
Final Forecast = Prophet Forecast + Predicted Residual
This combines the strengths of both models for improved forecasting performance.
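The combination step above is an element-wise sum. In this sketch, `predict_residual` stands in for a fitted residual model. In practice, that would be the `predict` method of an `xgboost.XGBRegressor` trained on the engineered features. Here it is replaced by a hypothetical toy function so the example runs without the library. All numbers are illustrative.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


def hybrid_forecast(
    prophet_forecast: Sequence[float],
    feature_rows: Sequence[Row],
    predict_residual: Callable[[Sequence[Row]], Sequence[float]],
) -> List[float]:
    """Final forecast_t = Prophet forecast_t + predicted residual_t."""
    residual_hat = list(predict_residual(feature_rows))
    if len(residual_hat) != len(prophet_forecast):
        raise ValueError("need exactly one predicted residual per forecast week")
    return [p + r for p, r in zip(prophet_forecast, residual_hat)]


def toy_residual_model(feature_rows: Sequence[Row]) -> List[float]:
    """Hypothetical stand-in for a fitted model's predict(): half of the first feature."""
    return [0.5 * row[0] for row in feature_rows]


prophet_forecast = [50_000.0, 51_000.0]
feature_rows = [[2_000.0], [-1_000.0]]  # e.g. last week's residual (illustrative)
print(hybrid_forecast(prophet_forecast, feature_rows, toy_residual_model))
# [51000.0, 50500.0]
```

Keeping the combination as a plain sum means the Prophet components can still be inspected on their own, and the XGBoost term can be reported separately as the weekly correction applied to them.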
Expected Benefits
The hybrid framework:
Captures both temporal and nonlinear sales patterns.
Improves forecast accuracy.
Enhances robustness against demand fluctuations.
Provides interpretable results for retail planning and decision-making.
Conclusion
The experimental evaluation shows that the hybrid Prophet–XGBoost framework significantly improves forecasting accuracy compared with the standalone Prophet model. By combining structured time-series decomposition with nonlinear residual learning, the hybrid approach reduces MAE, RMSE, and MAPE. This integrated forecasting strategy provides reliable predictions for weekly retail sales and offers valuable insights for inventory planning, supply chain optimization, and data-driven retail decision-making.
References
[1] R. Xu, "A Method for Wal-Mart Sales Forecasting Based on Machine Learning," in 2024 International Conference on Cloud Computing and Big Data (ICCBD 2024), ACM, 2024, 6 pages. ISBN 979-8-4007-1022-3. https://dl.acm.org/doi/10.1145/3695080.3695141
[2] T. Wang and X. Jiang, "ARIMA-BIGRU Stock Forecast Model Based on Bayesian Optimization," in Proceedings of the 2025 3rd International Conference on Communication Networks and Machine Learning (CNML 2025), ACM, 2025, pp. 314–319. ISBN 979-8-4007-1323-1. https://dl.acm.org/doi/10.1145/3728199.3728219
[3] C. S. Ram, M. Raja, and R. K. Chaturvedi, "Boosting Time-Series Forecasting Accuracy with SARIMAX Seasonal Interval Automation," Procedia Computer Science, vol. 260, pp. 814–821, 2025. ISSN 1877-0509. https://dl.acm.org/doi/10.1016/j.procs.2025.03.262
[4] D. Brykin, "Sales Forecasting Models: Comparison between ARIMA, LSTM and Prophet," Journal of Computer Science, vol. 20, pp. 1222–1230, 2024. https://doi.org/10.3844/jcssp.2024.1222.1230
[5] M. P. R. Mahin, M. Shahriar, R. R. Das, A. Roy, and A. W. Reza, "Enhancing Sustainable Supply Chain Forecasting Using Machine Learning for Sales Prediction," Procedia Computer Science, vol. 252, pp. 470–479, 2025. https://doi.org/10.1016/j.procs.2025.01.006
[6] X. Kong, Z. Chen, W. Liu, K. Ning, L. Zhang, S. M. Marier, Y. Liu, Y. Chen, and F. Xia, "Deep Learning for Time Series Forecasting: A Survey," International Journal of Machine Learning and Cybernetics, vol. 16, pp. 5079–5112, 2025. https://doi.org/10.1007/s13042-025-02560-w
[7] M. K. Ahmed, M. E. Hossin, M. M. R. Bhuiyan, S. Hossain, F. B. Khair, S. Hossain, and M. M. T. G. Manik, "Forecasting Sales Trends Using Time Series Analysis: A Comparative Study of Traditional and Machine Learning Models," Membrane Technology, vol. 2025, pp. 668–682, 2025. ISSN (online) 1873-4049. https://www.researchgate.net/publication/389411074